Fix inlining behaviour at the NVVM IR level by gmarkall · Pull Request #246 · NVIDIA/numba-cuda

gmarkall · 2025-05-09T12:02:56Z

PR #181 aimed to align the behaviour of the inline kwarg with that of upstream Numba, in that it now forces inlining at the Numba IR level. It turns out that this kwarg in Numba-CUDA already had the prior effect of enabling inlining at the NVVM IR level.

Because the default value of inline is "never", compile_cuda() interpreted it as a truthy value, so every device function was marked with the alwaysinline function attribute. This is a minor problem in that it probably forces a lot of inlining that we don't want, and a major problem in that it triggers an NVVM bug, only resolved in CUDA 12.3, that causes a hang in nvvmCompileProgram().
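To illustrate the pitfall described above, here is a minimal pure-Python sketch. The function `nvvm_attrs_before_fix` is a hypothetical stand-in, not the actual `compile_cuda()` source; it only assumes that the old code tested the `inline` value for truthiness.

```python
# Hypothetical stand-in for the pre-fix check; NOT the real numba-cuda code.
def nvvm_attrs_before_fix(inline="never"):
    """Return the set of function attributes the old logic would request."""
    attrs = set()
    # Any non-empty string is truthy in Python, so "never" passes this test
    # just like "always" does.
    if inline:
        attrs.add("alwaysinline")
    return attrs


# Both the default and an explicit "always" end up requesting alwaysinline.
default_attrs = nvvm_attrs_before_fix()
always_attrs = nvvm_attrs_before_fix("always")
print(default_attrs, always_attrs)
```

Under that assumption, the default `"never"` and an explicit `"always"` are indistinguishable, which is why every device function ended up with `alwaysinline`.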

To rectify these issues, we add the forceinline kwarg to the @cuda.jit decorator and the cuda.compile[_*]() functions. Now, compile_cuda() will only enable inlining at the NVVM IR level for forceinline and not inline. This is aligned with the behaviour of upstream Numba (see numba/numba#10068). We now document the inline and forceinline kwargs to clarify the intent and behaviour for users.

For clarity, the behaviour is now:

- The inline kwarg enables inlining only at the Numba IR level.
- The forceinline kwarg enables inlining only at the NVVM IR level.
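The two rules above can be modelled as a small decision function. This is a sketch only: `inlining_plan` and the keys `numba_ir` / `nvvm_ir` are hypothetical names chosen for illustration, and the model assumes `inline` is one of the strings `"never"` or `"always"` (any other values the real API may accept are ignored here).

```python
# Hypothetical model of the post-fix behaviour; names are illustrative only.
def inlining_plan(inline="never", forceinline=False):
    """Report at which IR level inlining is requested for a function."""
    return {
        # inline is compared against the string explicitly, never by truthiness
        "numba_ir": inline == "always",
        # forceinline is a plain bool and is the only NVVM IR switch
        "nvvm_ir": bool(forceinline),
    }


# Enumerate the four combinations of the two kwargs.
for inline in ("never", "always"):
    for forceinline in (False, True):
        print(inline, forceinline, inlining_plan(inline, forceinline))
```

The point of the model is that the two kwargs are independent: setting one never changes the outcome at the other level, and the defaults request no inlining at either level.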

ZzEeKkAa · 2025-05-09T16:01:23Z

numba_cuda/numba/cuda/dispatcher.py

            self.argtypes,
            debug=self.debug,
            lineinfo=lineinfo,
-            inline=inline,


Just double checking: since this happens at the LLVM IR level, there is no need to pass inline, because it affects only Numba IR?

Yeah, exactly - inline affects only Numba IR, forceinline affects only LLVM IR (which is how it should be when all is working correctly)

ZzEeKkAa · 2025-05-09T16:04:53Z

numba_cuda/numba/cuda/decorators.py

+       ``"always"``. See `Notes on Inlining
+       <https://numba.readthedocs.io/en/stable/developer/inlining.html>`_.
+    :type inline: str
+    :param forceinline: Enables inlining at the NVVM IR level when set to


Do we want it to be a bool, or do we want to expose LLVM IR attributes directly?
https://llvm.org/docs/LangRef.html#function-attributes

I want it to be bool so it's consistent with Numba. There is a plan for Numba to expose LLVM attributes directly, so I'll aim to align with that in future.

ZzEeKkAa

LGTM, few comments!

Thank you for fixing it. I did not mean to break inlining with the original MR. There was a lack of docs explaining that behavior...

gmarkall · 2025-05-09T16:08:07Z

There was a lack of docs explaining that behavior...

Not only that, but I didn't even know inline was working in any sense at all... I thought it was one of those things that was perpetually (or for a really long time) broken / a no-op.

gmarkall · 2025-05-09T16:10:19Z

@ZzEeKkAa Many thanks for the review - I've responded to the comments - let me know if I should follow up any further!

gmarkall · 2025-05-09T20:25:57Z

I'll merge this a little later today, as it seems to have been the resolution to RAPIDS hanging (in rapidsai/cudf#18688) in the v0.10.1 release - assuming there are no objections?

gmarkall requested review from ZzEeKkAa and brandon-b-miller May 9, 2025 12:03

gmarkall added the 3 - Ready for Review Ready for review by team label May 9, 2025

brandon-b-miller approved these changes May 9, 2025

gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request May 9, 2025

Bump version to 0.10.1

51d32d1

- Fix inlining behaviour at the NVVM IR level (NVIDIA#246 / NVIDIA#247)

gmarkall mentioned this pull request May 9, 2025

Bump version to 0.10.1 #248

Merged

gmarkall added a commit that referenced this pull request May 9, 2025

Bump version to 0.10.1 (#248)

6ca135b

- Fix inlining behaviour at the NVVM IR level (#246 / #247)

ZzEeKkAa reviewed May 9, 2025

ZzEeKkAa approved these changes May 9, 2025

gmarkall added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 3 - Ready for Review Ready for review by team labels May 9, 2025

gmarkall merged commit 3a3ff78 into NVIDIA:main May 9, 2025
37 checks passed

gmarkall added 5 - Ready to merge Testing and reviews complete, ready to merge and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels May 9, 2025

brandon-b-miller mentioned this pull request May 9, 2025

Bump version to 0.11.0 #250

Merged

gmarkall merged 1 commit into NVIDIA:main from gmarkall:fix-inlining-2
